Smart Paper Notes

The Scheduling Job-Set Optimization Problem: A Model-Based Diagnosis Approach

Patrick Rodler, Erich Teppan

Category: Artificial Intelligence

2020-09-23

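A common issue for companies is that the volume of product orders may at times exceed the production capacity. We formally introduce two novel problems that deal with the question of which orders to discard or postpone in order to meet certain (timeliness) goals, and we try to approach them by means of model-based diagnosis. In thorough analyses, we identify many similarities between the introduced problems and diagnosis problems, but also reveal crucial idiosyncrasies and outline ways to handle or leverage them. Finally, a proof-of-concept evaluation on industrial-scale problem instances from a well-known scheduling benchmark suite demonstrates that one of the two formalized problems can be attacked well with out-of-the-box model-based diagnosis tools.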

translated by Google Translate

LOSDD: Leave-Out Support Vector Data Description for Outlier Detection

Daniel Boiar, Thomas Liebig, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-27

Support Vector Machines have been successfully used for one-class classification (OCSVM, SVDD) when trained on clean data, but they work much worse on dirty data: outliers present in the training data tend to become support vectors, and are hence considered "normal". In this article, we improve the effectiveness to detect outliers in dirty training data with a leave-out strategy: by temporarily omitting one candidate at a time, this point can be judged using the remaining data only. We show that this is more effective at scoring the outlierness of points than using the slack term of existing SVM-based approaches. Identified outliers can then be removed from the data, such that outliers hidden by other outliers can be identified, to reduce the problem of masking. Naively, this approach would require training N individual SVMs (and training $O(N^2)$ SVMs when iteratively removing the worst outliers one at a time), which is prohibitively expensive. We will discuss that only support vectors need to be considered in each step and that by reusing SVM parameters and weights, this incremental retraining can be accelerated substantially. By removing candidates in batches, we can further improve the processing time, although it obviously remains more costly than training a single SVM.
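The leave-out idea from this abstract can be illustrated with a toy sketch. To stay dependency-free, the block below swaps the SVM-based data description for a much simpler stand-in (a ball around the centroid whose radius is the largest training distance), so it is not the LOSDD algorithm and does not include the incremental retraining or batching tricks. The names `fit_ball`, `ball_score`, `full_data_scores`, and `leave_out_scores` are hypothetical.

```python
import math

def fit_ball(points):
    """Toy data description: centroid plus the largest distance to it."""
    n, dim = len(points), len(points[0])
    center = tuple(sum(p[d] for p in points) / n for d in range(dim))
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def ball_score(point, model):
    """Distance to the center relative to the radius; > 1 means outside."""
    center, radius = model
    return math.dist(point, center) / radius

def full_data_scores(points):
    """Fit once on all points, including any outliers, then score them."""
    model = fit_ball(points)
    return [ball_score(p, model) for p in points]

def leave_out_scores(points):
    """Score each point against a model fitted on all OTHER points."""
    scores = []
    for i, p in enumerate(points):
        rest = points[:i] + points[i + 1:]
        scores.append(ball_score(p, fit_ball(rest)))
    return scores

# Four clean points on a unit square plus one obvious outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
full = full_data_scores(data)
loo = leave_out_scores(data)
```

In this toy setting the outlier defines the boundary of the model fitted on all data, so its score there never exceeds 1 and it looks "normal". When it is left out, the model is fitted on the clean points only and the outlier falls far outside the ball. This naive loop refits once per candidate, which mirrors the N-model cost the abstract describes as prohibitively expensive.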


Stop using the elbow criterion for k-means and how to choose the number of clusters instead

Erich Schubert

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-23

A major challenge when using k-means clustering often is how to choose the parameter k, the number of clusters. In this letter, we want to point out that it is very easy to draw poor conclusions from a common heuristic, the "elbow method". Better alternatives have been known in literature for a long time, and we want to draw attention to some of these easy to use options, that often perform better. This letter is a call to stop using the elbow method altogether, because it severely lacks theoretic support, and we want to encourage educators to discuss the problems of the method -- if introducing it in class at all -- and teach alternatives instead, while researchers and reviewers should reject conclusions drawn from the elbow method.


Comparison of Data Representations and Machine Learning Architectures for User Identification on Arbitrary Motion Sequences

Christian Schell, Andreas Hotho, Marc Erich Latoschik

Category: Machine Learning

2022-10-02

Reliable and robust user identification and authentication are important and often necessary requirements for many digital services. It becomes paramount in social virtual reality (VR) to ensure trust, specifically in digital encounters with lifelike realistic-looking avatars as faithful replications of real persons. Recent research has shown that the movements of users in extended reality (XR) systems carry user-specific information and can thus be used to verify their identities. This article compares three different potential encodings of the motion data from head and hands (scene-relative, body-relative, and body-relative velocities), and the performances of five different machine learning architectures (random forest, multi-layer perceptron, fully recurrent neural network, long-short term memory, gated recurrent unit). We use the publicly available dataset "Talking with Hands" and publish all code to allow reproducibility and to provide baselines for future work. After hyperparameter optimization, the combination of a long-short term memory architecture and body-relative data outperformed competing combinations: the model correctly identifies any of the 34 subjects with an accuracy of 100% within 150 seconds. Altogether, our approach provides an effective foundation for behaviometric-based identification and authentication to guide researchers and practitioners. Data and code are published under https://go.uniwue.de/58w1r.


Clustering by Direct Optimization of the Medoid Silhouette

Lars Lenssen, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-26

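The evaluation of clustering results is difficult and depends heavily on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures that try to provide a general measure for validating clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for its direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements. One of the versions guarantees results equal to the original variant and provides a run-time speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$ = 100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.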
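A minimal sketch of how a medoid-based silhouette can be evaluated for a given set of medoids. It assumes the commonly stated definition, where each point scores 1 - d1/d2 with d1 and d2 the distances to its nearest and second-nearest medoid, and the overall value is the average over all points. Whether this matches the paper's variant in every detail is an assumption, and the block only evaluates the measure; it does not implement the fast optimization algorithms the abstract mentions. The function name `medoid_silhouette` is hypothetical.

```python
import math

def medoid_silhouette(points, medoids):
    """Average of 1 - d1/d2 over all points (higher is better).

    d1 and d2 are the distances from a point to its nearest and
    second-nearest medoid. A point with d2 == 0 contributes 0.
    """
    if len(medoids) < 2:
        raise ValueError("at least two medoids are required")
    total = 0.0
    for p in points:
        dists = sorted(math.dist(p, m) for m in medoids)
        d1, d2 = dists[0], dists[1]
        total += 0.0 if d2 == 0 else 1.0 - d1 / d2
    return total / len(points)

# Two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# One medoid per pair versus both medoids placed in the same pair.
good = medoid_silhouette(points, [(0.0, 0.0), (10.0, 0.0)])
poor = medoid_silhouette(points, [(0.0, 0.0), (0.0, 1.0)])
```

Because each point needs only its distances to the k medoids, one evaluation costs on the order of N * k distance computations, which is what makes the medoid-based variant cheaper than the original Silhouette with its pairwise distances.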


On Projections to Linear Subspaces

Erik Thordsen, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-26

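The merit of projecting data onto linear subspaces is well known from, for example, dimensionality reduction. One key aspect of subspace projections, the maximum preservation of variance (principal component analysis), has been researched thoroughly, and the effect of random linear projections on measures such as intrinsic dimensionality is still an ongoing effort. In this paper, we investigate the less explored depths of linear projections onto explicit subspaces of varying dimensionality and the expectations of variance that ensue. The result is a new family of bounds for Euclidean distances and inner products. We showcase the quality of these bounds and investigate their close relation to intrinsic dimensionality estimation.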
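As background for why subspace projections can yield distance bounds at all, the sketch below checks the elementary fact that an orthogonal projection onto a linear subspace never increases Euclidean distance. This is textbook linear algebra, not the new family of bounds from the paper; the helper `project` and the example basis are hypothetical.

```python
import math

def project(x, basis):
    """Coordinates of x in the subspace spanned by an ORTHONORMAL basis."""
    return [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]

# Orthonormal basis of a 2-dimensional subspace of R^3.
s = 1.0 / math.sqrt(2.0)
basis = [(1.0, 0.0, 0.0), (0.0, s, s)]

x = (1.0, 2.0, 3.0)
y = (4.0, 0.0, -1.0)

full = math.dist(x, y)                                  # distance in R^3
proj = math.dist(project(x, basis), project(y, basis))  # distance in the subspace
```

The projected distance is a lower bound on the true distance because the squared distance splits into the part inside the subspace plus the part orthogonal to it, and the orthogonal part is never negative.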


The State of Sparse Training in Deep Reinforcement Learning

Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

Categories: Machine Learning | Artificial Intelligence

2022-06-17

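The use of sparse neural networks has grown rapidly in recent years, particularly in computer vision. Their appeal stems largely from the reduction in the number of parameters that must be trained and stored, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts to explore their use in deep reinforcement learning (DRL). In this work, we perform a systematic investigation into applying a number of existing sparse training techniques to a variety of DRL agents and environments. Our results corroborate, in the DRL domain, the findings from sparse training in computer vision: sparse networks perform better than dense networks for the same parameter count. We provide detailed analyses of how the various components in DRL are affected by the use of sparse networks, and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods and for advancing their use in DRL.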


Challenges of sampling and how phylogenetic comparative methods help: With a case study of the Pama-Nyungan laminal contrast

Jayden L. Macklin-Cordes, Erich R. Round

Category: Natural Language Processing

2022-01-01

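Phylogenetic comparative methods are new in our field and, for most linguists, shrouded in at least a little mystery. Yet the path that led to their discovery in comparative biology is so similar to the methodological history of balanced sampling that it is only an accident of history that they were not discovered by a typologist. Here we clarify the essential logic behind phylogenetic comparative methods and their fundamental relatedness to a deep intellectual tradition focused on sampling. We then introduce concepts, methods, and tools that will enable typologists to use these methods in everyday typological research. The key commonality of phylogenetic comparative methods and balanced sampling is that both attempt to deal with statistical non-independence due to genealogy. Whereas sampling can never achieve independence and requires most comparative data to be discarded, phylogenetic comparative methods achieve independence while retaining and using all of the data. We discuss the essential notions of phylogenetic signal; uncertainty about trees; typological averages and proportions that are sensitive to genealogy; comparison across language families; and the effects of areality. Extensive supplementary materials illustrate computational tools for practical analysis, and we illustrate the methods discussed with a typological case study of the laminal contrast in Pama-Nyungan.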


Evolution and trade-off dynamics of functional load

Erich Round, Rikker Dockum, Robin J. Ryder

Category: Natural Language Processing

2021-12-22

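Functional load (FL) quantifies the contribution that phonological contrasts make to distinctions across the lexicon. Previous research has linked particularly low values of FL to sound change. Here we broaden the scope of enquiry into FL to its evolution at all values. We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia. We find a high degree of phylogenetic signal in FL. Although phylogenetic signal has been reported for phonological structures such as phonotactics, its detection in measures of phonological function is novel. We also find a significant negative correlation between the FL of vowel length and the FL of the following consonant, that is, a deep-time historical trade-off dynamic, which we relate to known allophony in modern PN languages and to compensatory sound changes in their past. The finding reveals a historical dynamic, similar to transphonologization, which we characterize as a flow of contrastiveness between subsystems of the phonology. Recurring across a language family that spans a whole continent and many millennia of time depth, our finding provides one of the most compelling examples yet of Sapir's 'drift' hypothesis of non-accidental parallel development in historically related languages.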


Step-unrolled Denoising Autoencoders for Text Generation

Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

Categories: Natural Language Processing | Machine Learning

2021-12-13

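In this paper, we propose a new generative model of text, the Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is applied repeatedly to a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iterations than diffusion methods while qualitatively producing better samples on natural language datasets. SUNDAE achieves state-of-the-art results (among non-autoregressive methods) on the WMT'14 English-to-German translation task, and good qualitative results on unconditional language modeling on the Colossal Cleaned Common Crawl dataset and on a dataset of Python code from GitHub. The non-autoregressive nature of SUNDAE opens up possibilities beyond left-to-right prompted generation by filling in arbitrary blank patterns in a template.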
